Revisiting Label Smoothing Regularization with Knowledge Distillation
Authors
Abstract
Label Smoothing Regularization (LSR) is a widely used tool to generalize classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on its correlation with Knowledge Distillation (KD), which transfers knowledge from a teacher model to a lightweight student model by penalizing the Kullback–Leibler divergence between their outputs. Based on this observation, a Teacher-free Knowledge Distillation (Tf-KD) method was proposed in previous work: instead of a real teacher model, a handcrafted distribution similar to LSR guides the student's learning. Tf-KD is a promising substitute for LSR, except for its hard-to-tune, model-dependent hyperparameters. This paper develops a new teacher-free framework, LSR-OS-TC, which decomposes the Tf-KD method into two components: Output Smoothing (OS) and Teacher Correction (TC). Firstly, LSR-OS extends LSR to the KD regime by applying a softer temperature to the output of the softmax layer; output smoothing is critical for stabilizing the hyperparameters across different models. Secondly, in the TC part, a larger proportion of the uniform teacher distribution is assigned to the right class to provide a more informative teacher. The two-component method is evaluated exhaustively on image (datasets CIFAR-100, CIFAR-10, and CINIC-10) and audio (dataset GTZAN) classification tasks. The results show that LSR-OS can improve performance independently at no extra computational cost, especially on several deep neural networks where LSR is ineffective. A further training boost is obtained with the TC component, confirming the effectiveness of our strategy. Overall, LSR-OS-TC is a practical substitute for LSR: compared with the original method, it can be tuned on one model and then directly applied to other models.
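For readers who want the mechanics at a glance, the loss described above can be sketched in a few lines. The following is a minimal PyTorch sketch under stated assumptions: the hyperparameter names (`alpha`, `temperature`, `correction`) are illustrative placeholders rather than the paper's notation, and the mixing scheme follows common KD formulations rather than the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

def lsr_os_tc_loss(logits, target, alpha=0.1, temperature=4.0, correction=0.5):
    """Sketch of a teacher-free LSR-OS-TC style loss.

    NOTE: hyperparameter names and the exact mixing scheme are
    assumptions for illustration, not the paper's formulation.
    """
    num_classes = logits.size(1)

    # Output Smoothing (OS): soften the student's output with a temperature.
    log_student = F.log_softmax(logits / temperature, dim=1)

    # Teacher Correction (TC): start from a uniform "virtual teacher" and
    # assign a larger share of probability mass to the ground-truth class.
    teacher = torch.full_like(logits, (1.0 - correction) / (num_classes - 1))
    teacher.scatter_(1, target.unsqueeze(1), correction)

    # Standard cross-entropy plus a KL term against the handcrafted teacher,
    # mixed by alpha as in common KD setups (T^2 rescales the soft gradients).
    ce = F.cross_entropy(logits, target)
    kd = F.kl_div(log_student, teacher, reduction="batchmean")
    return (1.0 - alpha) * ce + alpha * (temperature ** 2) * kd
```

With `correction = 1 / num_classes` the handcrafted teacher degenerates to a uniform distribution, and with `temperature = 1` the KL term reduces to ordinary label smoothing, which is the decomposition the paper exploits.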
Similar resources
Topic Distillation with Knowledge Agents
This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...
Simple Square Smoothing Regularization Operators
Tikhonov regularization of linear discrete ill-posed problems often is applied with a finite difference regularization operator that approximates a low-order derivative. These operators generally are represented by banded rectangular matrices with fewer rows than columns. They therefore cannot be applied in iterative methods that are based on the Arnoldi process, which requires the regularizati...
Smoothing speech trajectories by regularization
The articulators of human speech might only be able to move slowly, which results in the gradual and continuous change of acoustic speech properties. Nevertheless, the so-called speech continuity is rarely explored to discriminate different phones. To exploit this, this paper investigates a multiple-frame MFCC representation (that is expected to retain sufficient time-continuity information) in...
Multi-Label Learning with Posterior Regularization
In many multi-label learning problems, especially as the number of labels grow, it is challenging to gather completely annotated data. This work presents a new approach for multi-label learning from incomplete annotations. The main assumption is that because of label correlation, the true label matrix as well as the soft predictions of classifiers shall be approximately low rank. We introduce a...
Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...
Journal
Journal title: Applied Sciences
Year: 2021
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app11104699